Hierarchical cluster language modeling with statistical rule extraction for rescoring n-best hypotheses during speech decoding
Abstract
We propose an unsupervised learning algorithm that learns hierarchical patterns of word sequences in spoken language utterances. It extracts cluster rules from training data, based on high n-gram probabilities, to cluster words or segment a sentence. Cluster trees, similar to parse trees, are constructed from the learned cluster rules. Hierarchical clustering thereby adds grammatical structure to the traditional trigram language model. The learned cluster rules are used to improve the n-best utterance hypothesis list that is output by the Sphinx III speech recognizer: our hierarchical cluster language model rescores and filters these n-best utterance hypotheses by assigning confidence scores to the segments of each hypothesis that can be clustered hierarchically with the learned cluster rules. Rescoring the original n-best hypothesis list, which is based on acoustic and trigram language model scores, with our hierarchical cluster language model results in a set of hypotheses with a lower word error rate. Our cluster language model was trained on TREC broadcast news data from 1995 and 1996, and tested on the HUB-4 '97 development test broadcast news data. Compared to manually created grammar rules, the cluster trees reflect the speech data more accurately because their cluster rules are learned automatically from empirical n-gram probabilities in the training data, whereas manually written grammar rules can introduce human bias and are expensive to develop. Prior symbolic knowledge in the form of rules can also be incorporated by simply applying the rules to the training data before the earliest applicable learning iteration. Our algorithm is also able to learn clusters that reflect various styles of data, whether formal, strictly grammatical language or loose conversational speech.
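The pipeline described in the abstract (learn cluster rules from high n-gram probabilities, stack them into cluster trees, then rescore an n-best list) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the selection criterion or the confidence formula, so the bigram-only rules, the `threshold` and `min_count` parameters, the coverage-based score, and every function name below are assumptions made purely for illustration.

```python
from collections import Counter


def apply_rules(tokens, rule_set):
    """Greedy left-to-right pass: merge each adjacent pair found in rule_set.
    A merged pair becomes a tuple, i.e. an internal node of a cluster tree."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in rule_set:
            out.append((tokens[i], tokens[i + 1]))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_cluster_rules(sentences, iterations=2, threshold=0.6, min_count=2):
    """Hypothetical learner: in each iteration, keep the adjacent pairs (a, b)
    whose estimated P(b | a) is at least `threshold`, then rewrite the corpus
    with those rules so the next iteration can cluster the clusters."""
    corpus = [list(s) for s in sentences]
    rules = []  # one rule set per iteration, applied in this order
    for _ in range(iterations):
        left = Counter(tok for s in corpus for tok in s[:-1])
        pairs = Counter(p for s in corpus for p in zip(s, s[1:]))
        learned = {p for p, n in pairs.items()
                   if n >= min_count and n / left[p[0]] >= threshold}
        if not learned:
            break
        rules.append(learned)
        corpus = [apply_rules(s, learned) for s in corpus]
    return rules


def cluster(tokens, rules):
    """Apply the learned rule sets in order, yielding a forest of cluster trees."""
    for rule_set in rules:
        tokens = apply_rules(tokens, rule_set)
    return tokens


def count_words(node):
    """Number of words under a tree node (a plain string is a single word)."""
    return 1 if isinstance(node, str) else sum(count_words(c) for c in node)


def coverage(tokens, rules):
    """Assumed confidence score: fraction of words that fall inside a cluster."""
    forest = cluster(list(tokens), rules)
    total = sum(count_words(n) for n in forest)
    inside = sum(count_words(n) for n in forest if not isinstance(n, str))
    return inside / total if total else 0.0


def rescore(nbest, rules, weight=1.0):
    """Re-rank (text, recognizer_score) pairs by score + weight * coverage.
    Higher recognizer scores are assumed to be better (e.g. log-probabilities)."""
    return sorted(nbest,
                  key=lambda h: h[1] + weight * coverage(h[0].split(), rules),
                  reverse=True)


# Toy data, invented for this example only.
training = [s.split() for s in (
    "the white house said today",
    "the white house announced",
    "officials at the white house said",
)]
rules = learn_cluster_rules(training)
nbest = [("the wide mouse said", -10.0), ("the white house said", -10.5)]
reranked = rescore(nbest, rules)
```

In this toy run the second hypothesis clusters completely and overtakes the first, whose words match no learned rule. In a real system the weight would need tuning against held-out data, and the recognizer score would combine the acoustic and trigram language model scores mentioned in the abstract.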
Similar resources
Direct word graph rescoring using A* search and RNNLM
The usage of Recurrent Neural Network Language Models (RNNLMs) has allowed reaching significant improvements in Automatic Speech Recognition (ASR) tasks. However, to take advantage of their capability for considering long histories, they are usually used to rescore the N-best lists (i.e. it is in practice not possible to use them directly during acoustic trellis search). We propose in this pape...
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition
Using context in automatic speech recognition allows the recognition system to dynamically task-adapt and bring gains to a broad variety of use-cases. An important mechanism of context inclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real-time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...
Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine
Effectively exploiting the resources available on modern multicore and manycore processors for tasks such as large vocabulary continuous speech recognition (LVCSR) is far from trivial. While prior works have demonstrated the effectiveness of manycore graphic processing units (GPU) for high-throughput, limited vocabulary speech recognition, they are unsuitable for recognition with large acoustic...
Fuzzy class rescoring: a part-of-speech language model
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...
Exploiting repair context in interactive error recovery
In current speech applications, facilities to correct recognition errors are limited to either choosing among alternative hypotheses (by voice or by mouse click) or respeaking. Information from the context of a repair is ignored. We developed a method which improves the accuracy of correcting speech recognition errors interactively by taking into account the context of the repair interaction...
Publication date: 1998